17-3 Digit Recognition: Varying MFCC Dimensions

In the previous section, we demonstrated how to use HTK for Mandarin digit recognition. In this and the following sections, we shall change various settings (such as the acoustic features and the acoustic model configuration) to improve the recognition rates.

For modularity, we have packed the basic training and test programs into an m-file function htkTrainTest.m. This function takes a structure variable that specifies all the parameters for training, and generates the final test results.

Even if we keep the configuration of the acoustic models unchanged, we can still change the acoustic features. In the previous section, we used 13-dimensional MFCC_E features. We can now change the feature type to 26-dimensional MFCC_E_D or MFCC_E_D_Z, or to 39-dimensional MFCC_E_D_A or MFCC_E_D_A_Z. For simplicity, we use string representations for the various feature types: the suffix _E appends the energy term, _D appends the delta coefficients, _A appends the acceleration (delta-delta) coefficients, and _Z applies cepstral mean subtraction, which leaves the dimension unchanged.
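The dimension counts quoted above can be checked with a little arithmetic on the feature type string. The following is only a minimal sketch, assuming 12 cepstral coefficients per frame (a value not stated in the text, chosen because it yields the 13 static features of MFCC_E); the variable names are illustrative and are not part of htkTrainTest.m.

```matlab
% Sketch: derive the feature dimension from a feature type string.
% Assumption: 12 cepstral coefficients, so MFCC_E has 12 + 1 = 13 static features.
numCeps = 12;
feaTypes = {'MFCC_E', 'MFCC_E_D_Z', 'MFCC_E_D_A_Z'};
feaDims = zeros(1, numel(feaTypes));
for i = 1:numel(feaTypes)
    parts = strsplit(feaTypes{i}, '_');      % e.g. {'MFCC','E','D','Z'}
    qualifiers = parts(2:end);
    % _E appends one energy term to the static features
    staticDim = numCeps + any(strcmp(qualifiers, 'E'));
    % _D and _A each append one more block of the same size as the static features
    blockCount = 1 + any(strcmp(qualifiers, 'D')) + any(strcmp(qualifiers, 'A'));
    % _Z only normalizes the features, so it does not enter the count
    feaDims(i) = staticDim * blockCount;
    fprintf('%-14s -> %d dimensions\n', feaTypes{i}, feaDims(i));
end
```

Under this assumption the three strings map to 13, 26, and 39 dimensions, which are the values that would be assigned to htkPrm.feaDim and htkPrm.streamWidth in the examples. Other qualifiers are ignored by this sketch.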

The following example uses 26-dimensional MFCC_E_D_Z for recognition:

Example 1: htk/chineseDigitRecog/training/goSyl26.m

    htkPrm=htkParamSet;
    htkPrm.pamFile='digitSyl.pam';
    htkPrm.feaCfgFile='mfcc26.cfg';
    htkPrm.feaType='MFCC_E_D_Z';
    htkPrm.feaDim=26;
    htkPrm.streamWidth=[26];
    disp(htkPrm)
    [trainRR, testRR]=htkTrainTest(htkPrm);
    fprintf('Inside test = %g%%, outside test = %g%%\n', trainRR, testRR);

Output:

         pamFile: 'digitSyl.pam'
      feaCfgFile: 'mfcc26.cfg'
         waveDir: '..\waveFile'
      sylMlfFile: 'digitSyl.mlf'
    phoneMlfFile: 'digitSylPhone.mlf'
         mnlFile: 'digitSyl.mnl'
     grammarFile: 'digit.grammar'
         feaType: 'MFCC_E_D_Z'
          feaDim: 26
      mixtureNum: 3
        stateNum: 3
     streamWidth: 26
    Pruning-Off
    Pruning-Off
    Pruning-Off
    Pruning-Off
    Pruning-Off
    Inside test = 91.29%, outside test = 92.86%

The corresponding batch file is goSyl26.bat.

Furthermore, the following example uses 39-dimensional MFCC_E_D_A_Z:

Example 2: htk/chineseDigitRecog/training/goSyl39.m

    htkPrm=htkParamSet;
    htkPrm.pamFile='digitSyl.pam';
    htkPrm.feaCfgFile='mfcc39.cfg';
    htkPrm.feaType='MFCC_E_D_A_Z';
    htkPrm.feaDim=39;
    htkPrm.streamWidth=[39];
    disp(htkPrm)
    [trainRR, testRR]=htkTrainTest(htkPrm);
    fprintf('Inside test = %g%%, outside test = %g%%\n', trainRR, testRR);

Output:

         pamFile: 'digitSyl.pam'
      feaCfgFile: 'mfcc39.cfg'
         waveDir: '..\waveFile'
      sylMlfFile: 'digitSyl.mlf'
    phoneMlfFile: 'digitSylPhone.mlf'
         mnlFile: 'digitSyl.mnl'
     grammarFile: 'digit.grammar'
         feaType: 'MFCC_E_D_A_Z'
          feaDim: 39
      mixtureNum: 3
        stateNum: 3
     streamWidth: 39
    Pruning-Off
    Pruning-Off
    Pruning-Off
    Pruning-Off
    Pruning-Off
    Inside test = 91.07%, outside test = 92.86%

The corresponding batch file is goSyl39.bat.

Because the batch files are not packed into functions, their contents look more complicated. In fact, only two lines differ between goSyl13.bat and goSyl26.bat. You can use the following Windows command to verify the difference:

    fc goSyl13.bat goSyl26.bat

Similarly, you can use the same command to verify the difference between goSyl26.bat and goSyl39.bat.
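The same check can also be done from within MATLAB, which may be handy when fc is not available. The sketch below is only illustrative: it writes two small temporary files with hypothetical stand-in contents (the real batch files are not reproduced here) and reports the line numbers at which they differ. To compare the actual files, set fileA and fileB to their paths and skip the part that writes and deletes the temporary files.

```matlab
% Sketch: list the line numbers at which two text files differ.
% The file contents below are hypothetical stand-ins, not the real batch files.
contentA = sprintf('line one A\nline two A\nshared line\n');
contentB = sprintf('line one B\nline two B\nshared line\n');
fileA = [tempname, '.txt'];
fileB = [tempname, '.txt'];
fid = fopen(fileA, 'w'); fwrite(fid, contentA); fclose(fid);
fid = fopen(fileB, 'w'); fwrite(fid, contentB); fclose(fid);

% Split each file into lines, accepting both Windows and Unix line endings
linesA = regexp(fileread(fileA), '\r?\n', 'split');
linesB = regexp(fileread(fileB), '\r?\n', 'split');

% Compare line by line over the common length
n = min(numel(linesA), numel(linesB));
diffIdx = find(~strcmp(linesA(1:n), linesB(1:n)));
fprintf('Differing line numbers:%s\n', sprintf(' %d', diffIdx));
if numel(linesA) ~= numel(linesB)
    fprintf('The files also differ in length.\n');
end

% Remove the temporary files
delete(fileA);
delete(fileB);
```

Unlike fc, this sketch compares lines strictly by position, so an inserted or deleted line makes every later line count as different.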
Audio Signal Processing and Recognition (音訊處理與辨識)